Text File | 1995-06-28 | 4KB | 84 lines
Matching
Previous: <Patterns=>Patterns> * Next: <Actions=>Actions> * Up: <Top=>!Root>
#Wrap on
{fH3}How the input is matched{f}
When the generated scanner is run, it analyzes its input
looking for strings which match any of its patterns. If
it finds more than one match, it takes the one matching
the most text (for trailing context rules, this includes
the length of the trailing part, even though it will then
be returned to the input). If it finds two or more
matches of the same length, the rule listed first in the
{fCode}flex{f} input file is chosen.
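The selection rule can be sketched in plain C. This is only an illustration of the rule, not the table-driven code that {fCode}flex{f} actually generates: the names {fCode}matcher_fn{f}, {fCode}match_keyword_if{f}, {fCode}match_identifier{f} and {fCode}select_rule{f} are hypothetical, and the sketch assumes a match of zero characters means "no match".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher type: returns how many leading characters of
 * `input` the pattern matches, or 0 if it does not match at all. */
typedef size_t (*matcher_fn)(const char *input);

/* Matches the literal keyword "if". */
size_t match_keyword_if(const char *input)
{
    return strncmp(input, "if", 2) == 0 ? 2 : 0;
}

/* Matches a run of lowercase letters, like the pattern [a-z]+ */
size_t match_identifier(const char *input)
{
    size_t n = 0;
    while (islower((unsigned char)input[n]))
        n++;
    return n;
}

/* Rules in the order they would be listed in the input file. */
static const matcher_fn rules[] = { match_keyword_if, match_identifier };

/* Returns the index of the winning rule, or -1 if no rule matches.
 * The longest match wins; the strict '>' keeps the earlier rule
 * when two rules match the same amount of text. */
int select_rule(const char *input, size_t *match_len)
{
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t len = rules[i](input);
        if (len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    *match_len = best_len;
    return best;
}
```

With the input "if(" both rules match two characters, so the keyword rule wins because it is listed first; with "iffy" the identifier rule wins because it matches more text.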
Once the match is determined, the text corresponding to
the match (called the {fStrong}token{f}) is made available in the
global character pointer {fCode}yytext{f}, and its length in the
global integer {fCode}yyleng{f}. The {fStrong}action{f} corresponding to the
matched pattern is then executed (a more detailed
description of actions follows), and then the remaining input is
scanned for another match.
If no match is found, then the {fUnderline}default rule{f} is executed:
the next character in the input is considered matched and
copied to the standard output. Thus, the simplest legal
{fCode}flex{f} input is:
#Wrap off
#fCode
%%
#f
#Wrap on
which generates a scanner that simply copies its input
(one character at a time) to its output.
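The visible behaviour of such a scanner is roughly that of the following C function. This is a sketch of the effect only, not of the generated code; the name {fCode}echo_unmatched{f} is hypothetical.

```c
#include <stdio.h>

/* Copies `in` to `out` one character at a time, which is the visible
 * effect of a scanner whose only rule is the default rule.
 * Returns the number of characters copied. */
long echo_unmatched(FILE *in, FILE *out)
{
    long copied = 0;
    int c;
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        copied++;
    }
    return copied;
}
```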
Note that {fCode}yytext{f} can be defined in two different ways:
either as a character {fEmphasis}pointer{f} or as a character {fEmphasis}array{f}.
You can control which definition {fCode}flex{f} uses by including
one of the special directives {fEmphasis}%pointer{f} or {fEmphasis}%array{f} in the
first (definitions) section of your {fCode}flex{f} input. The
default is {fEmphasis}%pointer{f}, unless you use the {fEmphasis}-l{f} lex
compatibility option, in which case {fCode}yytext{f} will be an array. The
advantage of using {fEmphasis}%pointer{f} is substantially faster
scanning and no buffer overflow when matching very large
tokens (unless you run out of dynamic memory). The
disadvantage is that you are restricted in how your actions can
modify {fCode}yytext{f} (see the next section), and calls to the
{fEmphasis}unput(){f} function destroy the present contents of {fCode}yytext{f},
which can be a considerable porting headache when moving
between different {fCode}lex{f} versions.
The advantage of {fEmphasis}%array{f} is that you can then modify {fCode}yytext{f}
to your heart's content, and calls to {fEmphasis}unput(){f} do not
destroy {fCode}yytext{f} (see below). Furthermore, existing {fCode}lex{f}
programs sometimes access {fCode}yytext{f} externally using
declarations of the form:
#Wrap off
#fCode
extern char yytext[];
#f
#Wrap on
This definition is erroneous when used with {fEmphasis}%pointer{f}, but
correct for {fEmphasis}%array{f}.
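The reason the two declarations are not interchangeable is that in C an array object and a pointer object are different things, even though both can be indexed. The following sketch shows the difference using stand-in names; {fCode}DEMO_LMAX{f}, {fCode}text_as_array{f} and {fCode}text_as_pointer{f} are hypothetical and are not names defined by {fCode}flex{f}.

```c
#include <stddef.h>

#define DEMO_LMAX 8192   /* hypothetical size, standing in for YYLMAX */

char text_as_array[DEMO_LMAX];           /* the shape %array gives   */
char *text_as_pointer = text_as_array;   /* the shape %pointer gives */

/* An array object occupies storage for all of its characters ... */
size_t array_object_size(void)
{
    return sizeof text_as_array;
}

/* ... while a pointer object only holds an address. */
size_t pointer_object_size(void)
{
    return sizeof text_as_pointer;
}
```

A file that declares an array where the definition is really a pointer makes the compiler treat the bytes of the pointer itself as character data, so an external declaration has to match the form actually in use: an array declaration for {fEmphasis}%array{f}, and a pointer declaration such as {fCode}extern char *yytext;{f} for {fEmphasis}%pointer{f}.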
{fEmphasis}%array{f} defines {fCode}yytext{f} to be an array of {fCode}YYLMAX{f} characters,
which defaults to a fairly large value. You can change
the size by simply \#define'ing {fCode}YYLMAX{f} to a different value
in the first section of your {fCode}flex{f} input. As mentioned
above, with {fEmphasis}%pointer{f} {fCode}yytext{f} grows dynamically to
accommodate large tokens. While this means your {fEmphasis}%pointer{f} scanner
can accommodate very large tokens (such as matching entire
blocks of comments), bear in mind that each time the
scanner must resize {fCode}yytext{f} it also must rescan the entire
token from the beginning, so matching such tokens can
prove slow. {fCode}yytext{f} presently does {fEmphasis}not{f} dynamically grow if
a call to {fEmphasis}unput(){f} results in too much text being pushed
back; instead, a run-time error results.
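To get a feel for the rescanning cost, the following C sketch counts how many characters are examined while matching one large token. It rests on two stated assumptions, neither taken from the text above: the buffer starts at {fCode}initial_cap{f} characters, and it doubles each time it fills. The function name {fCode}chars_scanned{f} is hypothetical and the growth policy is illustrative only.

```c
#include <stddef.h>

/* Total characters examined while matching one token of `token_len`
 * characters, assuming the buffer starts at `initial_cap` characters,
 * doubles whenever it fills, and every resize restarts the scan at
 * the beginning of the token. */
size_t chars_scanned(size_t token_len, size_t initial_cap)
{
    size_t cap = initial_cap ? initial_cap : 1;   /* avoid a zero-size buffer */
    size_t total = 0;
    while (cap < token_len) {
        total += cap;   /* scanned to the end of the buffer, then had to grow */
        cap *= 2;
    }
    return total + token_len;   /* the final pass covers the whole token */
}
```

Under these assumptions a 100-character token with a 16-character starting buffer costs 16 + 32 + 64 + 100 = 212 character examinations, a little over twice the token length.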
Also note that you cannot use {fEmphasis}%array{f} with C++ scanner
classes (the {fCode}c++{f} option; see below).